Implementing a Fast Lucas-Lehmer Test on Programmable Graphics Hardware

Author

Andrew Thall

Abstract

The Lucas-Lehmer test provides a deterministic algorithm for testing whether, for a prime number p, M_p = 2^p - 1 is also a prime number. The current work demonstrates that this test can be effectively implemented on a parallel graphics processing unit (GPU). The parallelization was achieved by two main parallel methods: (1) fast multiplication using parallel Fast Fourier transforms in extended precision; (2) fast parallel carry addition for arbitrary-precision numbers. Extended precision is necessary in the Fourier transforms to allow single-precision graphics hardware to achieve sufficient precision for tests on non-trivial values of M_p. Methods (1) and (2) allow data to remain on the graphics card throughout the test and minimize the runtime cost of bus traffic between the host and GPU. The algorithm has been implemented in the Cg language and tested on several hardware platforms. The current work demonstrates the viability of current and future GPUs for number-theoretic computation. [Addenda (2009): While actual implementations of this were not competitive with highly optimized sequential algorithms such as those used by GIMPS, a similar implementation using modern double-precision GPU hardware and CUDA kernels, rather than Cg shaders, might produce superior runtimes to sequential algorithms.]
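
For readers unfamiliar with the test, the recurrence it relies on can be sketched sequentially as follows. This is only an illustration and not the paper's GPU implementation: it uses Python's built-in arbitrary-precision integers in place of the FFT-based multiplication and parallel carry addition described in the abstract, and the function name lucas_lehmer is an assumption made for this example.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime. Assumes p itself is prime."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below applies to odd p
    m = (1 << p) - 1  # the Mersenne number M_p
    s = 4             # starting value s_0 = 4
    # Iterate s_{k+1} = s_k^2 - 2 (mod M_p) a total of p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    # M_p is prime exactly when the final residue is zero.
    return s == 0


if __name__ == "__main__":
    exponents = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in exponents if lucas_lehmer(p)])
```

The squaring step dominates the cost for large p, which is why the abstract concentrates on fast multiplication and carry propagation.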

Similar resources

Computation on GPUs: From a Programmable Pipeline to an Efficient Stream Processor

The recent development of graphics hardware is changing the implementation of the graphics pipeline from a fixed set of functions to user-developed special programs executed on a per-vertex or per-fragment basis. This programmability allows the efficient implementation of different algorithms directly on the graphics hardware. In this tutorial we will present the main techn...

From Behavioral to RTL Design Flow in SystemC: LLR–PROSILOG scientific collaboration

This paper reports on the scientific collaboration between LLR and PROSILOG. The aim of this collaboration was to show that a system can be quickly implemented on an FPGA using SystemC as the sole description language. Starting from the behavioral abstraction level, the model is refined down to RTL before hardware synthesis, then automatically translated to the equivalent model in VHDL or ...

Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. A general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing pow...

Secure FPGA Design by Filling Unused Spaces

There are now several kinds of attacks on Field Programmable Gate Arrays (FPGAs). Because FPGAs are used in many different applications, their security is an important concern, especially in Internet of Things (IoT) applications. Hardware Trojan Horse (HTH) insertion is one of the major security threats, and it can be implemented in the unused space of the FPGA. This unused space is unavoidable to ...

Implementing a Programmable Pixel Pipeline in FPGAs

Complex three-dimensional graphics rendering is a computationally very intensive process, so even the newest microprocessors cannot handle more complicated scenes in real time. Hardware solutions are therefore required to produce realistic rendering. This paper discusses an FPGA implementation that supports programmable pixel computing.

Publication date: 2009
